Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data.
Uddin, Shahadat; Lu, Haohui.
Affiliation
  • Uddin S; School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, NSW, Australia.
  • Lu H; School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, NSW, Australia.
PLoS One ; 19(4): e0301541, 2024.
Article in En | MEDLINE | ID: mdl-38635591
ABSTRACT
Many individual studies have reported that tree-based machine learning (ML) algorithms outperform other algorithms, but the literature lacks statistical validation of this superiority. This study addresses the gap by applying five ML algorithms to 200 open-access datasets from a wide range of research contexts. Specifically, it examines two tree-based algorithms (decision tree and random forest) and three non-tree-based algorithms (support vector machine, logistic regression and k-nearest neighbour). Paired-sample t-tests show that both tree-based algorithms perform better than each non-tree-based algorithm on all four performance measures considered (accuracy, precision, recall and F1 score), each at the p<0.001 significance level. This superiority is consistent across both the model development and test phases. For further validation, the study also applied paired-sample t-tests to subsets of the datasets from the disease prediction (66) and university-ranking (50) research contexts. In both subsets, tree-based ML algorithms significantly outperformed non-tree-based algorithms on all four performance measures. We discuss the research implications of these findings in detail in this article.
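The paired-sample t-test named in the abstract can be illustrated with a minimal sketch. It assumes each dataset contributes one score per algorithm, so the scores form natural pairs. The function name `paired_t_statistic` and the accuracy values are hypothetical and are not taken from the study; the abstract does not say which software the authors used. The sketch computes only the t statistic and degrees of freedom. A p-value would then come from the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired-sample t-test."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    # One difference per dataset: algorithm A minus algorithm B.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    # t = mean difference divided by its standard error.
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies, one pair per dataset (NOT the study's data).
random_forest_acc = [0.91, 0.85, 0.78, 0.88, 0.93, 0.81]
logistic_reg_acc = [0.87, 0.80, 0.79, 0.83, 0.90, 0.76]

t_stat, dof = paired_t_statistic(random_forest_acc, logistic_reg_acc)
print(f"t = {t_stat:.3f}, df = {dof}")
```

A positive t statistic here means the first algorithm scored higher on average across the paired datasets. Whether that difference is significant depends on the p-value for the given degrees of freedom.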
Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Algorithms / Machine Learning Limits: Humans Language: En Journal: PLoS One Journal subject: SCIENCE / MEDICINE Year: 2024 Document type: Article Affiliation country: Country of publication:
